Symbolic Dynamic Programming for Continuous State and Action MDPs

Authors

  • Zahra Zamani
  • Scott Sanner
  • Cheng Fang
Abstract

The Q-value regression steps:

1. Continuous marginal integration (symbolic substitution):
   Q_a := ∫ Q_a ⊗ P(x'_j | b, b', x, a, y) dx'_j

2. Discrete marginal summation, for all b_i in Q_a [Case ⊕]:
   Q_a := [Q_a ⊗ P(b'_i | b, x, a, y)]|_{b'_i = 1} ⊕ [Q_a ⊗ P(b'_i | b, x, a, y)]|_{b'_i = 0}

3. Compute the final Q-value (discount and add reward):
   Q_a := R(b, x, a, y) ⊕ (γ ⊗ Q_a)

Note that ∫ f(x'_j) ⊗ δ[x'_j − h(z)] dx'_j = f(x'_j){x'_j / h(z)}, where the latter operation indicates that any occurrence of x'_j in f(x'_j) is symbolically substituted with the case statement h(z) (Sanner, Delgado, and de Barros, UAI 2011).
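The symbolic substitution step can be illustrated with a minimal sketch (not the authors' XADD implementation): here a case statement is just a list of (condition, value) string pairs, and regressing a deterministic transition x' = h(x, y) textually replaces every occurrence of x' with h, realizing f(x'){x'/h(z)}.

```python
def substitute(case_statement, var, h):
    """Implement f(x'){x'/h(z)}: replace every occurrence of `var`
    in each partition's condition and value with the expression h."""
    return [(cond.replace(var, f"({h})"), val.replace(var, f"({h})"))
            for cond, val in case_statement]

# Value function over the next state x1, as a two-partition case statement.
V = [("x1 >= 5", "10"), ("x1 < 5", "0")]

# Deterministic transition x1 = x + y (the action shifts the state by y).
Q = substitute(V, "x1", "x + y")
print(Q)  # [('(x + y) >= 5', '10'), ('(x + y) < 5', '0')]
```

After substitution the resulting partitions are conditions over the current state and action variables, which is exactly what makes the backup closed-form for deterministic continuous transitions.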

Similar resources

Symbolic Dynamic Programming for Discrete and Continuous State MDPs

Many real-world decision-theoretic planning problems can be naturally modeled with discrete and continuous state Markov decision processes (DC-MDPs). While previous work has addressed automated decision-theoretic planning for DC-MDPs, optimal solutions have only been defined so far for limited settings, e.g., DC-MDPs having hyper-rectangular piecewise linear value functions. In this work, we ext...


Symbolic Dynamic Programming for Continuous State and Observation POMDPs

Point-based value iteration (PBVI) methods have proven extremely effective for finding (approximately) optimal dynamic programming solutions to partially observable Markov decision processes (POMDPs) when a set of initial belief states is known. However, no PBVI work has provided exact point-based backups for both continuous state and observation spaces, which we tackle in this paper. Our key in...


Symbolic Dynamic Programming

Decision-theoretic planning aims at constructing a policy for acting in an uncertain environment that maximizes an agent’s expected utility along a sequence of steps that solve a goal. For this task, Markov decision processes (MDPs) have become the standard model. However, classical dynamic programming algorithms for solving MDPs require explicit state and action enumeration, which is often imp...


Stochastic Dynamic Programming with Markov Chains for Optimal Sustainable Control of the Forest Sector with Continuous Cover Forestry

We present a stochastic dynamic programming approach with Markov chains for optimal control of the forest sector. The forest is managed via continuous cover forestry and the complete system is sustainable. Forest industry production, logistic solutions and harvest levels are optimized based on the sequentially revealed states of the markets. Adaptive full system optimization is necessary for co...


Bounded Approximate Symbolic Dynamic Programming for Hybrid MDPs

Recent advances in symbolic dynamic programming (SDP) combined with the extended algebraic decision diagram (XADD) data structure have provided exact solutions for mixed discrete and continuous (hybrid) MDPs with piecewise linear dynamics and continuous actions. Since XADD-based exact solutions may grow intractably large for many problems, we propose a bounded error compression technique for XA...
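The bounded-error idea can be sketched in a much-simplified, hypothetical form (the paper compresses XADDs, not flat partition lists): merge adjacent partitions of a piecewise-constant value function whose values are within 2·ε of each other, replacing them with their midpoint, so each merge introduces at most ε error.

```python
def compress(partitions, epsilon):
    """Merge adjacent partitions whose constant values differ by at most
    2*epsilon, replacing both with their midpoint (error <= epsilon)."""
    out = [partitions[0]]
    for ub, val in partitions[1:]:
        _, prev_val = out[-1]
        if abs(val - prev_val) <= 2 * epsilon:
            out[-1] = (ub, (val + prev_val) / 2)  # extend previous partition
        else:
            out.append((ub, val))
    return out

# Partitions as (upper_bound, value) pairs, sorted by upper bound.
V = [(1, 0.0), (2, 0.1), (3, 5.0), (4, 5.05)]
print(compress(V, 0.1))  # four partitions compress to two
```

Note that chained merges in this naive greedy sketch can accumulate error beyond ε relative to the original values; a practical compression technique must track a global error bound.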



Publication date: 2012